Sub-Sampled Newton Methods I: Globally Convergent Algorithms
Authors
Farbod Roosta-Khorasani, Michael W. Mahoney
International Computer Science Institute, Berkeley, CA 94704 and Department of Statistics, University of California at Berkeley, Berkeley, CA 94720. farbod/[email protected]
Abstract
Large-scale optimization problems are ubiquitous in machine learning and data analysis, and there is a plethora of algorithms for solving such problems. Many of these algorithms employ sub-sampling as a way to speed up the computations and/or to implicitly implement a form of statistical regularization. In this paper, we consider second-order iterative optimization algorithms, i.e., those that use Hessian as well as gradient information, and we provide bounds on the convergence of variants of Newton's method that incorporate uniform sub-sampling as a means to estimate the gradient and/or the Hessian. Our bounds are non-asymptotic, i.e., they hold for a finite number of data points in finite dimensions for a finite number of iterations. In addition, they are quantitative and depend on quantities related to the problem, i.e., the condition number. At the same time, our algorithms are global and are guaranteed to converge from any initial iterate. Using random matrix concentration inequalities, one can sub-sample the Hessian in a way that preserves the curvature information. Our first algorithm incorporates such a sub-sampled Hessian while using the full gradient. We also give additional convergence results for the case where the sub-sampled Hessian is regularized, either by modifying its spectrum or through ridge-type regularization. Next, in addition to Hessian sub-sampling, we consider sub-sampling the gradient as a way to further reduce the computational complexity per iteration. We use approximate matrix multiplication results from randomized numerical linear algebra (RandNLA) to obtain the proper sampling strategy. In all of these algorithms, computing the update boils down to solving a large-scale linear system, which can be computationally expensive. As a remedy, for all of our algorithms we also give global convergence results for the case of inexact updates, where such a linear system is solved only approximately. This paper has a more advanced companion paper [40] in which we demonstrate that, by doing a finer-grained analysis, we can obtain problem-independent bounds for the local convergence of these algorithms and explore tradeoffs to improve upon the basic results of the present paper.
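To make the kind of iteration studied in the paper concrete, the following is a minimal Python sketch of one such scheme: the full gradient is combined with a Hessian formed from a uniform sub-sample of the component functions, and the Newton system is solved only approximately with a few conjugate-gradient steps (an "inexact update"). The function names (grad_full, hess_i) and the fixed step size are illustrative assumptions, not the paper's notation; the algorithms analyzed in the paper additionally rely on a line search and on sample sizes chosen via matrix concentration bounds to guarantee global convergence.

```python
import numpy as np

def subsampled_newton(grad_full, hess_i, w0, n, sample_size,
                      iterations=20, step_size=1.0, cg_iters=10, seed=0):
    """Toy sub-sampled Newton loop: full gradient, uniformly sub-sampled
    Hessian, and an inexact update computed with a few CG iterations.

    grad_full(w) returns the gradient of F(w) = (1/n) * sum_i f_i(w);
    hess_i(w, i) returns the Hessian of the i-th component f_i
    (assumed positive semi-definite so that CG is applicable).
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float).copy()

    def cg_solve(A, b, iters):
        # Plain conjugate gradient, stopped early to mimic an inexact solve.
        x = np.zeros_like(b)
        r = b.copy()
        p = r.copy()
        rs = r @ r
        for _ in range(iters):
            Ap = A @ p
            alpha = rs / (p @ Ap)
            x = x + alpha * p
            r = r - alpha * Ap
            rs_new = r @ r
            if rs_new < 1e-16:
                break
            p = r + (rs_new / rs) * p
            rs = rs_new
        return x

    for _ in range(iterations):
        g = grad_full(w)                                    # full gradient
        S = rng.choice(n, size=sample_size, replace=False)  # uniform sample of indices
        H = sum(hess_i(w, j) for j in S) / sample_size      # sub-sampled Hessian estimate
        direction = cg_solve(H, g, cg_iters)                # inexact Newton direction
        w = w - step_size * direction
    return w
```

Under the same assumptions, the gradient-sub-sampled variant discussed in the abstract would simply replace grad_full(w) with an average of component gradients over a second (typically larger) random sample, with the sampling strategy guided by the RandNLA approximate matrix multiplication results.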
Similar resources
Globally Convergent Newton Algorithms for Blind Decorrelation
This paper presents novel Newton algorithms for the blind adaptive decorrelation of real and complex processes. They are globally convergent and exhibit an interesting relationship with the natural gradient algorithm for blind decorrelation and the Goodall learning rule. Indeed, we show that these two latter algorithms can be obtained from their Newton decorrelation versions when an exact matrix...
A Class of Globally Convergent Algorithms for Pseudomonotone Variational Inequalities
We describe a fairly broad class of algorithms for solving variational inequalities, global convergence of which is based on the strategy of generating a hyperplane separating the current iterate from the solution set. The methods are shown to converge under very mild assumptions. Specifically, the problem mapping is only assumed to be continuous and pseudomonotone with respect to at least one ...
Evolutionary Computing for Operating Point Analysis of Nonlinear Circuits
The DC operating point of an electronic circuit is conventionally found using the Newton-Raphson method. This method is not globally convergent and can only find one solution of the circuit at a time. In this paper, evolutionary computing methods, including Genetic Algorithms, Evolutionary Programming, Evolutionary Strategies and Differential Evolution, are explored as possible alternatives to Ne...
Nesterov's Acceleration For Approximate Newton
Optimization plays a key role in machine learning. Recently, stochastic second-order methods have attracted much attention due to their low computational cost in each iteration. However, these algorithms might perform poorly especially if it is hard to approximate the Hessian well and efficiently. As far as we know, there is no effective way to handle this problem. In this paper, we resort to N...
Journal: CoRR
Volume: abs/1601.04737
Pages: -
Publication date: 2016